
Supplemental Information for "Diverse Community Data for Benchmarking Data Privacy Algorithms" (October 27, 2023)

Neural Information Processing Systems

SDNist is intended as a tool to encourage investigation and discussion of deidentification algorithms; it is not intended or suitable for product evaluation. The National Institute of Standards and Technology does not endorse any algorithm included in these resources.



Assessing the informative value of macroeconomic indicators for public health forecasting

Chakraborty, Shome, Khan, Fardil, Ghosal, Soutik

arXiv.org Machine Learning

Macroeconomic conditions influence the environments in which health systems operate, yet their value as leading signals of health system capacity has not been systematically evaluated. In this study, we examine whether selected macroeconomic indicators contain predictive information for several capacity-related public health targets, including employment in the health and social assistance workforce, new business applications in the sector, and health care construction spending. Using monthly U.S. time series data, we evaluate multiple forecasting approaches, including neural network models with different optimization strategies, generalized additive models, random forests, and time series models with exogenous macroeconomic indicators, under alternative model fitting designs. Across evaluation settings, we find that macroeconomic indicators provide a consistent and reproducible predictive signal for some public health targets, particularly workforce and infrastructure measures, while other targets exhibit weaker or less stable predictability. Models emphasizing stability and implicit regularization tend to perform more reliably during periods of economic volatility. These findings suggest that macroeconomic indicators may serve as useful upstream signals for digital public health monitoring, while underscoring the need for careful model selection and validation when translating economic trends into health system forecasting tools.
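The design described above can be illustrated with a toy sketch. The data below are synthetic and hypothetical (a random-walk macro indicator driving a health-capacity target), and plain numpy least squares stands in for the paper's models; the point is only to show how adding a lagged exogenous macro regressor to an autoregressive baseline can reduce one-step-ahead holdout error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 240  # twenty years of monthly observations

# Synthetic macro indicator (random walk) driving a health-capacity target.
macro = np.cumsum(rng.normal(0.0, 1.0, n))
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * macro[t - 1] + rng.normal(0.0, 0.3)

# Lag-1 design matrices: AR-only baseline vs. AR plus exogenous indicator.
X_ar = np.column_stack([np.ones(n - 1), y[:-1]])
X_ex = np.column_stack([np.ones(n - 1), y[:-1], macro[:-1]])
target = y[1:]

split = 180  # fit on the first 15 years, evaluate on the rest

def holdout_mae(X):
    beta, *_ = np.linalg.lstsq(X[:split], target[:split], rcond=None)
    return float(np.mean(np.abs(X[split:] @ beta - target[split:])))

mae_ar = holdout_mae(X_ar)  # baseline error
mae_ex = holdout_mae(X_ex)  # error with the lagged macro regressor
```

On this synthetic series the exogenous regressor carries most of the signal, so the augmented model's holdout error is clearly lower; on real data the gap is an empirical question, which is what the study evaluates.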


The Republican Plan to Reform the Census Could Put Everyone's Privacy at Risk

WIRED

A little-known algorithmic process called "differential privacy" helps keep census data anonymous. President Donald Trump and the Republican Party have spent the better part of the president's second term radically reshaping the federal government. But in recent weeks, the GOP has set its sights on taking another run at an old target: the US census. Since the first Trump administration, the right has sought to add a question to the census that captures a respondent's immigration status and to exclude noncitizens from the tallies that determine how seats in Congress are distributed. In 2019, the Supreme Court struck down an attempt by the first Trump administration to add a citizenship question to the census. But now, a little-known algorithmic process called "differential privacy," created to keep census data from being used to identify individual respondents, has become the right's latest focus.



32e54441e6382a7fbacbbbaf3c450059-Supplemental.pdf

Neural Information Processing Systems

We only included a candidate variable if the nearest neighbor match was exact. We compared the "fnlwgt" data to all weight variables, including "UH_WGTS_A1", which has a similar distribution. Since we did not identify an exact match for "fnlwgt", and the variable is not a property of an individual, we do not utilize it further. We vary the threshold from 6,000 to 72,000. In our experiments, as the "unconstrained" base classifier, we use a gradient boosted decision tree.

B.1 ACSIncome: Predict whether US working adults' yearly income is above $50,000. Target: PINCP (total person's income); an individual's label is 1 if PINCP > 50000, otherwise 0. Features: AGEP (age), range of values 0-99 (integers), where 0 indicates less than 1 year old.
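The ACSIncome labeling rule can be stated directly in code. The records below are hypothetical toy rows, not ACS PUMS data; only the field names (PINCP, AGEP) and the $50,000 threshold come from the task definition above.

```python
# Hypothetical toy records standing in for ACS PUMS rows.
records = [
    {"PINCP": 72000, "AGEP": 45},
    {"PINCP": 31000, "AGEP": 23},
    {"PINCP": 50000, "AGEP": 60},  # exactly 50,000: label 0, inequality is strict
]

def acsincome_label(row):
    """Binary target: 1 if total person's income (PINCP) strictly exceeds $50,000."""
    return 1 if row["PINCP"] > 50000 else 0

labels = [acsincome_label(r) for r in records]
print(labels)  # [1, 0, 0]
```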



DeSIA: Attribute Inference Attacks Against Limited Fixed Aggregate Statistics

Mao, Yifeng, Stevanoski, Bozhidar, de Montjoye, Yves-Alexandre

arXiv.org Artificial Intelligence

Empirical inference attacks are a popular approach for evaluating the privacy risk of data release mechanisms in practice. While an active attack literature exists to evaluate machine learning models or synthetic data release, we currently lack comparable methods for fixed aggregate statistics, in particular when only a limited number of statistics are released. We here propose an inference attack framework against fixed aggregate statistics and an attribute inference attack called DeSIA. We instantiate DeSIA against the U.S. Census PPMF dataset and show it to strongly outperform reconstruction-based attacks. In particular, we show DeSIA to be highly effective at identifying vulnerable users, achieving a true positive rate of 0.14 at a false positive rate of $10^{-3}$. We then show DeSIA to perform well against users whose attributes cannot be verified and when varying the number of aggregate statistics and level of noise addition. We also perform an extensive ablation study of DeSIA and show how DeSIA can be successfully adapted to the membership inference task. Overall, our results show that aggregation alone is not sufficient to protect privacy, even when a relatively small number of aggregates are being released, and emphasize the need for formal privacy mechanisms and testing before aggregate statistics are released.
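The "true positive rate at a fixed false positive rate" evaluation used above is a generic metric. The sketch below is not the authors' code, and the attack scores are made up; it only shows, in plain numpy, how TPR at a target FPR is typically computed from per-record scores.

```python
import numpy as np

def tpr_at_fpr(scores, labels, target_fpr):
    """TPR at the highest threshold whose false positive rate stays <= target_fpr."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    neg = np.sort(scores[labels == 0])[::-1]      # negatives' scores, descending
    k = int(np.floor(target_fpr * neg.size))      # max false positives allowed
    thresh = neg[k] if k < neg.size else -np.inf  # (k+1)-th highest negative score
    return float(np.mean(scores[labels == 1] > thresh))

# Toy attack scores: higher means the attacker is more confident in the inferred attribute.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
```

At `target_fpr = 0.25` one of the four negatives may fall above the threshold, so the threshold drops to 0.3 and all four positives are caught (TPR 1.0); at `target_fpr = 0` the threshold sits at the top negative score 0.6 and TPR falls to 0.75.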


Toward Equitable Access: Leveraging Crowdsourced Reviews to Investigate Public Perceptions of Health Resource Accessibility

Xue, Zhaoqian, Liu, Guanhong, Wei, Kai, Zhang, Chong, Zeng, Qingcheng, Hu, Songhua, Hua, Wenyue, Fan, Lizhou, Zhang, Yongfeng, Li, Lingyao

arXiv.org Artificial Intelligence

Access to health resources is a critical determinant of public well-being and societal resilience, particularly during public health crises when demand for medical services and preventive care surges. However, disparities in accessibility persist across demographic and geographic groups, raising concerns about equity. Traditional survey methods often fall short due to limitations in coverage, cost, and timeliness. This study leverages crowdsourced data from Google Maps reviews, applying advanced natural language processing techniques, specifically ModernBERT, to extract insights on public perceptions of health resource accessibility in the United States during the COVID-19 pandemic. Additionally, we employ Partial Least Squares regression to examine the relationship between accessibility perceptions and key socioeconomic and demographic factors including political affiliation, racial composition, and educational attainment. Our findings reveal that public perceptions of health resource accessibility varied significantly across the U.S., with disparities peaking during the pandemic and slightly easing post-crisis. Political affiliation, racial demographics, and education levels emerged as key factors shaping these perceptions. These findings underscore the need for targeted interventions and policy measures to address inequities, fostering a more inclusive healthcare infrastructure that can better withstand future public health challenges.


The 2020 United States Decennial Census Is More Private Than You (Might) Think

Su, Buxin, Su, Weijie J., Wang, Chendi

arXiv.org Machine Learning

The U.S. Decennial Census serves as the foundation for many high-profile policy decision-making processes, including federal funding allocation and redistricting. In 2020, the Census Bureau adopted differential privacy to protect the confidentiality of individual responses through a disclosure avoidance system that injects noise into census data tabulations. The Bureau subsequently posed an open question: Could sharper privacy guarantees be obtained for the 2020 U.S. Census compared to their published guarantees, or equivalently, had the nominal privacy budgets been fully utilized? In this paper, we affirmatively address this open problem by demonstrating that between 8.50% and 13.76% of the privacy budget for the 2020 U.S. Census remains unused for each of the eight geographical levels, from the national level down to the block level. This finding is made possible through our precise tracking of privacy losses using $f$-differential privacy, applied to the composition of private queries across various geographical levels. Our analysis indicates that the Census Bureau introduced unnecessarily high levels of injected noise to achieve the claimed privacy guarantee for the 2020 U.S. Census. Consequently, our results enable the Bureau to reduce noise variances by 15.08% to 24.82% while maintaining the same privacy budget for each geographical level, thereby enhancing the accuracy of privatized census statistics. We empirically demonstrate that reducing noise injection into census statistics mitigates distortion caused by privacy constraints in downstream applications of private census data, illustrated through a study examining the relationship between earnings and education.
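For intuition on the variance-reduction claim, consider the Gaussian mechanism under the Gaussian special case of f-DP (mu-GDP), where a mechanism with L2 sensitivity Delta and noise standard deviation sigma satisfies mu = Delta / sigma. The numbers below are illustrative, not the Bureau's actual parameters: they only show how a given percentage reduction in variance maps to sigma and to the per-query mu, which tighter composition accounting can absorb while the overall claimed guarantee is maintained.

```python
import math

def gdp_mu(sensitivity, sigma):
    """Gaussian mechanism with L2 sensitivity and noise std sigma satisfies mu-GDP with mu = sensitivity / sigma."""
    return sensitivity / sigma

# Hypothetical baseline: sigma chosen to meet some claimed guarantee.
sigma0 = 10.0
mu0 = gdp_mu(1.0, sigma0)  # 0.1

# Suppose tighter accounting shows a 20% variance reduction is admissible.
var_reduction = 0.20
sigma1 = sigma0 * math.sqrt(1 - var_reduction)  # variance 100 -> 80
mu1 = gdp_mu(1.0, sigma1)  # ~0.112: less noise means a larger per-query mu,
                           # absorbed by the slack the tighter accounting reveals
```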